Community Resource Index

Objective: Identify the census tracts most in need of resources, based on educational data
Scope: The four major counties in the DFW area (Dallas, Collin, Denton, and Tarrant)

Click here to see the index on a map

“Morally and economically, allowing so many of our children to grow up so far from opportunity threatens our future. If we are willing to embrace the challenge of working collectively and strategically, Dallas can cut childhood poverty in half within a single generation.” —Mayor Mike Rawlings
Child poverty is a serious problem. Children living in poverty are more likely than their peers to experience repeated trauma, and that trauma is likely to reduce their opportunities later on. The cycle is hard to break unless resources reach the right families. The goal of this project is to build an index that improves resource allocation by helping prioritize and direct funding to the right agencies and schools.
The index comprises five categories: community, economics, education, health, and family. In this project, we focused on education.

For the education sub-index, we use 11 features and 4 target variables as indicators. The features are early education enrollment, school poverty, student-teacher ratio, free lunch, reduced lunch, Title I school, high-quality ECE centers, math proficiency, reading proficiency, high school graduation rate, and second education rate. The target variables are household income, mental health, physical health, and poverty probability index.

First, we used a KNN imputer to handle missing values. Instead of imputing all missing values at once, we split the data by county and imputed each county separately, because the nonprofit organizations we worked with told us that conditions differ considerably from county to county. We then scaled all variables with MinMaxScaler so that they are comparable. Finally, we reversed three target variables (mental health, physical health, and poverty probability index) by multiplying them by -1, so that a higher score always means a better outcome.
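The preprocessing described above could be sketched roughly as follows with pandas and scikit-learn. This is a minimal illustration, not the project's actual code: the function names, the column names passed in, and the choice of n_neighbors are all assumptions, and it assumes the scaling is fitted on all tracts together after the per-county imputation.

```python
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler


def impute_by_county(df, county_col, value_cols, n_neighbors=5):
    """Fill missing values with a separate KNN imputer per county.

    Assumes every county has at least one observed value in each column;
    a column that is entirely missing within a county would need
    separate handling.
    """
    out = df.copy()
    out[value_cols] = out[value_cols].astype(float)
    for _, rows in out.groupby(county_col).groups.items():
        # Fit on this county's rows only, so neighbors never cross counties.
        imputer = KNNImputer(n_neighbors=n_neighbors)
        out.loc[rows, value_cols] = imputer.fit_transform(out.loc[rows, value_cols])
    return out


def scale_and_reverse(df, value_cols, reversed_cols):
    """Min-max scale all variables to [0, 1], then flip the sign of some."""
    out = df.copy()
    out[value_cols] = MinMaxScaler().fit_transform(out[value_cols])
    # Multiply by -1 so that a higher value always means a better outcome.
    out[reversed_cols] = -out[reversed_cols]
    return out
```

Note that a reversed column ends up in the range -1 to 0 after this step, which preserves the ordering of tracts while flipping its direction.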
After preprocessing, we used simple linear regression, with one independent variable per model, to avoid multicollinearity among the independent variables. We obtained the coefficients by regressing each target variable on each independent variable in turn.

The following table shows the results and the weight for each feature:

Features	Household Income	Mental Health - Reversed	Physical Health - Reversed	Poverty Probability Index - Reversed	Average	Average-Scaled	Wj	Weights
Early Education Enrollment	0.17	0.27	0.25	0.14	0.21	6.44	3.72	0.34
School Poverty	0.26	0.39	0.35	0.23	0.31	9.59	5.29	0.48
Student-Teacher Ratio	-0.12	-0.35	-0.32	-0.15	-0.23	-7.26	-3.13	-0.29
Free Lunch	-0.27	-0.41	-0.44	-0.29	-0.35	-10.94	-4.97	-0.45
Reduced Lunch	-0.11	-0.07	-0.02	0.05	-0.04	-1.20	-0.10	-0.01
Title I School	-0.16	-0.21	-0.22	-0.15	-0.19	-5.74	-2.37	-0.22
High-Quality ECE Centers	0.07	-0.19	-0.08	-0.40	-0.15	-4.71	-1.86	-0.17
Math Proficiency	0.37	0.51	0.51	0.29	0.42	13.09	7.05	0.64
Reading Proficiency	0.37	0.53	0.54	0.33	0.44	13.76	7.38	0.67
High School Graduation Rate	-0.45	-0.74	-0.73	-0.41	-0.58	-18.01	-8.51	-0.77
Second Education Rate	0.39	0.66	0.62	0.39	0.52	15.99	8.49	0.77

Steps for calculating the weight of each feature:
1. Calculate the average of the correlation coefficients across the four target variables (rj)
2. Rescale average correlation coefficients so that they sum up to the number of indicators:
     Rj= rj*D/S
     Where:
         D is the number of sub-indicators (11)
         S is the sum of the average correlation coefficients (rj)
3. Standardize the rescaled average correlation coefficients (Rj) by averaging each with a constant unit weight of 1:
     Wj= (Rj+1)/2
4. Final weights: rescale Wj so that the weights sum to one:
     Wj/sum(Wj)
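
The four steps above can be sketched in a few lines of Python. The function name is hypothetical, and the inputs are the rounded values from the Average column of the table, so the result agrees with the Weights column only approximately (to within rounding).

```python
def weights_from_averages(r):
    """Turn average coefficients (rj) into final weights that sum to one."""
    d = len(r)                 # D: number of sub-indicators
    s = sum(r.values())        # S: sum of the average coefficients
    rescaled = {k: v * d / s for k, v in r.items()}          # Rj = rj*D/S
    shifted = {k: (v + 1) / 2 for k, v in rescaled.items()}  # Wj = (Rj+1)/2
    total = sum(shifted.values())
    return {k: v / total for k, v in shifted.items()}        # Wj/sum(Wj)


# Rounded values taken from the "Average" column of the table.
averages = {
    "Early Education Enrollment": 0.21,
    "School Poverty": 0.31,
    "Student-Teacher Ratio": -0.23,
    "Free Lunch": -0.35,
    "Reduced Lunch": -0.04,
    "Title I School": -0.19,
    "High-Quality ECE Centers": -0.15,
    "Math Proficiency": 0.42,
    "Reading Proficiency": 0.44,
    "High School Graduation Rate": -0.58,
    "Second Education Rate": 0.52,
}
weights = weights_from_averages(averages)
```

Since the Rj sum to D by construction, the Wj also sum to D, so the last step amounts to dividing each Wj by 11.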

Finally, for every census tract, we multiply each feature by its weight and sum the results to obtain the index. We then rescale the index to the range 0 to 100. A lower score means the census tract is worse off with respect to child poverty.
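
A minimal sketch of this last step is shown below. The function name and the nested-dict input layout are hypothetical. The text does not say how the rescaling to 0 to 100 is done, so min-max rescaling is assumed here.

```python
def resource_index(tracts, weights):
    """Weighted sum of features per tract, rescaled to the range 0 to 100.

    tracts: {tract_id: {feature_name: scaled_value}}
    weights: {feature_name: weight}
    """
    raw = {
        tract: sum(weights[f] * values[f] for f in weights)
        for tract, values in tracts.items()
    }
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        raise ValueError("all tracts have the same raw score; cannot rescale")
    # Assumed min-max rescaling: lowest tract maps to 0, highest to 100.
    return {tract: 100 * (v - lo) / (hi - lo) for tract, v in raw.items()}
```

With this rescaling, the tract with the lowest weighted sum always scores 0 and the one with the highest always scores 100, so scores are relative to the set of tracts included.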

(Please find Python code here.)